1 | Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization ... | BASE
2 | Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation ... | BASE
3 | Self-Guided Curriculum Learning for Neural Machine Translation ... | BASE
4 | Arabic Speech Recognition by End-to-End, Modular Systems and Human ... | BASE
5 | Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec ... | BASE
6 | On Prosody Modeling for ASR+TTS based Voice Conversion ... | BASE
Abstract: In voice conversion (VC), an approach showing promising results in the latest voice conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) model to transcribe the source speech into the underlying linguistic contents; these are then used as input by a text-to-speech (TTS) system to generate the converted speech. Such a paradigm, referred to as ASR+TTS, overlooks the modeling of prosody, which plays an important role in speech naturalness and conversion similarity. Although some researchers have considered transferring prosodic clues from the source speech, there arises a speaker mismatch during training and conversion. To address this issue, in this work, we propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP). We evaluate both methods on the VCC2020 benchmark and consider different linguistic representations. The results demonstrate the effectiveness of TTP in both objective and ... : Submitted to ASRU2021. Under review ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2107.09477 https://dx.doi.org/10.48550/arxiv.2107.09477
7 | Leveraging Pre-trained Language Model for Speech Sentiment Analysis ... | BASE
8 | End-to-end ASR to jointly predict transcriptions and linguistic annotations ... | BASE
9 | Differentiable Allophone Graphs for Language-Universal Speech Recognition ... | BASE
10 | Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021 ... | BASE
11 | CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings | BASE
In: CHiME 2020 - 6th International Workshop on Speech Processing in Everyday Environments, May 2020, Barcelona / Virtual, Spain (2020) ; https://hal.inria.fr/hal-02546993
14 | A Comparative Study on Transformer vs RNN in Speech Applications ... | BASE
16 | Towards Online End-to-end Transformer Automatic Speech Recognition ... | BASE
18 | The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines | BASE
In: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Sep 2018, Hyderabad, India (2018) ; https://hal.inria.fr/hal-01744021
19 | Analysis of Multilingual Sequence-to-Sequence speech recognition systems ... | BASE
20 | Language model integration based on memory control for sequence to sequence speech recognition ... | BASE